The Use of Optimal Cue Mapping to Improve the Intelligibility and Quality of Speech in Complex Binaural Sound Mixtures.
A person with normal hearing has the ability to follow a particular conversation of interest in a noisy and reverberant environment, whilst simultaneously ignoring the interfering sounds. This task often becomes more challenging for individuals with a hearing impairment. Attending selectively to a sound source is difficult to replicate in machines, including devices such as hearing aids. A correctly set up hearing aid will work well in quiet conditions, but its performance may deteriorate seriously in the presence of competing sounds. To be of help in these more challenging situations, the hearing aid should be able to segregate the desired sound source from any other, unwanted sounds.
This thesis explores a novel approach to speech segregation based on optimal cue mapping (OCM). OCM is a signal processing method for segregating a sound source based on spatial and other cues extracted from the binaural mixture of sounds arriving at a listener's ears. The spectral energy fraction of the target speech source in the mixture is estimated frame-by-frame using artificial neural networks (ANNs). The resulting target speech magnitude estimates for the left and right channels are combined with the corresponding original phase spectra to produce the final binaural output signal. The performance improvements delivered by the OCM algorithm are evaluated using the STOI and PESQ metrics for speech intelligibility and quality, respectively. A variety of increasingly challenging binaural mixtures are synthesised, involving up to five spatially separate sound sources in both anechoic and reverberant environments. The segregated speech consistently exhibits gains in intelligibility and quality and compares favourably with a leading, somewhat more complex approach. The OCM method allows the selection and integration of multiple cues to be optimised and provides scalable performance benefits to suit the available computational resources. The ability to determine the varying relative importance of each cue in different acoustic conditions is expected to facilitate computationally efficient solutions suitable for use in a hearing aid, allowing the aid to operate effectively in a range of typical acoustic environments. Further developments are proposed to achieve this overall goal.
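The output stage described above, where the per-bin target energy fraction scales the mixture magnitude before recombination with the original phase, can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function name `segregate_frame` and the Hann-windowed FFT framing are assumptions, and `energy_fraction` stands in for the ANN's per-bin output, which is not reproduced here.

```python
import numpy as np

def segregate_frame(mixture_frame, energy_fraction, n_fft=512):
    """Apply an estimated target energy fraction (a ratio mask) to one
    frame of one channel of a binaural mixture, keeping the mixture phase.

    energy_fraction: per-frequency-bin values in [0, 1], standing in for
    the ANN's frame-by-frame estimate of the target's share of the mixture.
    """
    windowed = mixture_frame * np.hanning(len(mixture_frame))
    spectrum = np.fft.rfft(windowed, n_fft)
    magnitude, phase = np.abs(spectrum), np.angle(spectrum)
    # Scale the mixture magnitude by the estimated target fraction and
    # recombine with the original phase spectrum, as in the OCM output stage.
    target_spectrum = energy_fraction * magnitude * np.exp(1j * phase)
    return np.fft.irfft(target_spectrum, n_fft)
```

The same call would be made per frame for the left and right channels, and the masked frames overlap-added to form the binaural output.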
Tuning Pre-trained Model via Moment Probing
Recently, efficient fine-tuning of large-scale pre-trained models has attracted increasing research interest, where linear probing (LP) serves as a fundamental module for exploiting the final representations for task-dependent classification. However, most existing methods focus on how to effectively introduce a few learnable parameters, and little work pays attention to the commonly used LP module. In this paper, we propose a novel Moment Probing (MP) method to further explore the potential of LP. Distinguished from LP, which builds a linear classification head on the mean of final features (e.g., word tokens for ViT) or classification tokens, our MP performs linear classification on the feature distribution, which provides stronger representation ability by exploiting the richer statistical information inherent in features. Specifically, we represent the feature distribution by its characteristic function, which is efficiently approximated using the first- and second-order moments of features. Furthermore, we propose a multi-head convolutional cross-covariance (MHC) module to compute second-order moments in an efficient and effective manner. Considering that MP could affect feature learning, we introduce a partially shared module to learn two recalibrating parameters (PSRP) for backbones based on MP, namely MP+. Extensive experiments on ten benchmarks using various models show that our MP significantly outperforms LP and is competitive with counterparts at lower training cost, while our MP+ achieves state-of-the-art performance.
Comment: Accepted to ICCV 2023; Project Page: https://github.com/mingzeG/Moment-Probin
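The core idea, probing first- and second-order moments of the token features rather than only their mean, can be sketched as below. This is a hedged illustration, not the paper's code: the head-wise covariance blocks merely stand in for the MHC module, whose convolutional formulation is not reproduced, and `moment_features` is an invented name.

```python
import numpy as np

def moment_features(tokens, n_heads=4):
    """Build moment-based probing features from a (n_tokens, dim) array of
    final-layer token features: concatenate the token mean (first-order
    moment) with per-head cross-covariances (second-order moments).
    A linear classifier would then be trained on this vector."""
    n_tokens, dim = tokens.shape
    mean = tokens.mean(axis=0)            # first-order moment
    centered = tokens - mean
    head_dim = dim // n_heads
    covs = []
    for h in range(n_heads):
        # Per-head covariance block, a crude stand-in for the paper's
        # multi-head convolutional cross-covariance (MHC).
        block = centered[:, h * head_dim:(h + 1) * head_dim]
        covs.append((block.T @ block / n_tokens).ravel())
    return np.concatenate([mean] + covs)
```

Splitting the covariance into heads keeps the second-order feature size at n_heads * (dim / n_heads)^2 instead of dim^2, which is the efficiency motivation the abstract alludes to.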
Trainability Analysis of Quantum Optimization Algorithms from a Bayesian Lens
The Quantum Approximate Optimization Algorithm (QAOA) is an extensively studied variational quantum algorithm used for solving optimization problems on near-term quantum devices. A significant focus is placed on determining the effectiveness of training the n-qubit QAOA circuit, i.e., whether the optimization error can converge to a constant level as the number of optimization iterations scales polynomially with the number of qubits. In realistic scenarios, the landscape of the corresponding QAOA objective function is generally non-convex and contains numerous local optima. In this work, motivated by the favorable performance of Bayesian optimization in handling non-convex functions, we theoretically investigate the trainability of the QAOA circuit through the lens of the Bayesian approach, which treats the corresponding QAOA objective function as a sample drawn from a specific Gaussian process. Specifically, we focus on two scenarios: the noiseless QAOA circuit and the noisy QAOA circuit subjected to local Pauli channels. Our first result demonstrates that the noiseless QAOA circuit with a depth of can be trained efficiently, based on the widely accepted assumption that either the left or right slice of each block in the circuit forms a local 1-design. Furthermore, we show that if each quantum gate is affected by a -strength local Pauli channel with a noise strength in the range of to 0.1, the noisy QAOA circuit with a depth of can also be trained efficiently. Our results offer valuable insights into the theoretical performance of quantum optimization algorithms in the noisy intermediate-scale quantum era.
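To make the object under study concrete, a depth-1 QAOA objective for MaxCut can be evaluated by brute-force statevector simulation as below. This toy is only an illustration of the non-convex landscape that the Bayesian analysis models as a Gaussian-process sample; it is not the paper's construction, and `qaoa_energy` is an invented name.

```python
import numpy as np

def qaoa_energy(gamma, beta, edges, n):
    """Expected cut value of a depth-1 QAOA state for MaxCut on an
    n-qubit graph, computed by exact statevector simulation."""
    dim = 2 ** n
    # Diagonal cost operator: number of cut edges for each basis state.
    cost = np.array([sum(((z >> i) & 1) != ((z >> j) & 1) for i, j in edges)
                     for z in range(dim)], dtype=float)
    state = np.full(dim, 1 / np.sqrt(dim), dtype=complex)  # uniform |+...+>
    state = state * np.exp(-1j * gamma * cost)             # cost layer e^{-i*gamma*C}
    # Mixing layer: RX(2*beta) applied to every qubit.
    c, s = np.cos(beta), -1j * np.sin(beta)
    for q in range(n):
        mask = 1 << q
        lo = np.array([z for z in range(dim) if not z & mask])
        a, b = state[lo].copy(), state[lo | mask].copy()
        state[lo], state[lo | mask] = c * a + s * b, s * a + c * b
    return float(np.real(np.sum(np.abs(state) ** 2 * cost)))
```

Scanning this function over a grid of (gamma, beta) already exhibits the multiple local optima that motivate treating the objective through a Bayesian lens.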